Abstract

The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes the 37 genes usually found in insect mitogenomes. The B. minax mitogenome was compared with the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated that its structure and organization are typical of, and similar to, those of these nine closely related species, although it has the lowest genome-wide A+T content (67.3%). Four short intergenic spacers that are highly conserved among B. minax and the nine other tephritid species were observed; these spacers also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed significant positive correlations of amino acid sequence distance (ASD) with both the A+T content of zero-fold degenerate sites (P0FD) and the ratio of the nucleotide substitution frequency at P0FD sites to that at all degenerate sites (zero-fold, two-fold and four-fold degenerate sites). Further, a significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of the nucleotide substitution frequency at P4FD sites to that at all degenerate sites; however, ASD was significantly negatively correlated with both the A+T content of P4FD sites and the ratio of the nucleotide substitution frequency at P4FD sites to that at all degenerate sites. A higher nucleotide substitution frequency at non-synonymous than at synonymous sites was observed in nad4, which, to our knowledge, is the first such observation in an insect mitogenome. A poly(T) stretch at the 5' end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the poly(T) stretch among the ten tephritid species, and two tandem repeats were present in the CR.
Introduction

The family Tephritidae, generally known as "true" fruit flies,
includes 471 genera and 4257 species distributed throughout the
temperate and tropical areas of the world. Many species are of
critical importance to humans, either as pests of fruit and vegetable
crops or as beneficial species for the control of weeds [1]. The fruit
fly Bactrocera minax Enderlein (Diptera: Tephritidae), generally
known as the Chinese citrus fruit fly, has been a serious pest of
commercial citrus crops in China for more than half a century [2].
This species has been recorded in southern China, India (West
Bengal and Sikkim) and Bhutan [2,3] on wild and cultivated citrus
species [4]. Some of its hosts are endemic to southern China and the
eastern Himalayan region [5], but B. minax has also been reported on
the kumquat Fortunella crassifolia [6] and the boxthorn Lycium
chinense [2].

B. minax was first collected from India and Sikkim and
designated B. minax Enderlein [1]. Drew [3] provided a detailed
description and illustration of the B. minax type specimens collected
in 1920 and assigned the species to the genus Bactrocera
(Polistomimetes). White and Wang [7] designated a lectotype of B.
minax and assigned the species to the subgenus Bactrocera (Tetradacus);
in addition, they indicated that Bactrocera citri Chen, collected from
China in 1940, should be placed in synonymy with B. minax.

A wide variety of questions about the biology and phylogeny of
B. minax have been addressed with the aid of molecular tools.
These studies have drawn on two main sources of genetic data:
nuclear sequence data and, most frequently, mitochondrial
sequence data. Insect mitochondrial DNA (mtDNA) usually
occurs as a double-stranded, closed circular molecule ranging in
size from 14 to 20 kb and generally encodes 13 protein-coding
genes (PCGs), two ribosomal RNAs (rRNAs) and 22 transfer RNAs
(tRNAs); this gene content is conserved across bilaterian metazoans
with only a few exceptions (e.g. loss of a small number of genes in
some derived groups) [8]. The molecule contains at least one
non-coding sequence of variable length, known as the A+T-rich region
or control region (CR), which contains initiation sites for transcription
and replication [9] and ranges in size from tens to several thousands
of base pairs [10-13]. Because of its highly conserved gene
structure across phyla, maternal inheritance, high copy number
and relatively fast evolutionary rate compared with nuclear DNA [14],
mitochondrial genome (mitogenome) sequences have been
regarded as useful molecular markers in studies of comparative
and evolutionary genomics, molecular evolution, phylogenetics,
phylogeography and population genetics [15].

Figure 1. Circular map of the mitogenome of B. minax. Genes drawn outside the bold circle are encoded on the J-strand and are transcribed in the opposite direction to genes drawn inside the bold circle (N-strand). The complete B. minax mitogenome was assembled from 21 fragments (F1-F21), shown as single lines within the bold circle.
doi:10.1371/journal.pone.0100558.g001

Many complete or nearly complete mitogenomes have been
sequenced, and comparative analyses at the genus or species level
have used multiple complete mitochondrial genes, instead of a single
gene or partial genes, in studies of molecular systematics [16-20],
population genetics/phylogeography [16], diagnostics [21], molecular
evolution [13,22,23], the frequency and type of gene
rearrangements [24,25] and the evolution of genome size [26].
To date, more than 500 insect mitogenomes from all orders,
including 77 dipterans in 24 families, have been sequenced and are
available in GenBank. In this study, we sequenced the complete
mitogenome of B. minax (Diptera: Tephritidae).

GenBank contains mitogenome information for only ten Tephritidae
species: Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis,
Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis,
Bactrocera correcta, Bactrocera cucurbitae, Ceratitis capitata and B. minax.
Nine of these species belong to the genus Bactrocera, including four
species of the B. dorsalis species complex; the remaining species belongs
to the genus Ceratitis. Within the nine Bactrocera species, B. philippinensis,
B. carambolae, B. papayae and B. dorsalis belong to the B. dorsalis species
complex; B. correcta, B. cucurbitae and B. tryoni belong to other
species groups within the subgenus Bactrocera; and B. oleae and B.
minax belong to the subgenera Daculus and Tetradacus, respectively.
Although recent molecular evidence suggests that B. papayae, B.
philippinensis and B. dorsalis likely represent a single species [27-30],
we compare the sequence and organization of the B. minax mitogenome
with those of each of the tephritid species B. oleae, B. dorsalis,
B. philippinensis, B. carambolae, B. papayae, B. correcta, B. cucurbitae,
B. tryoni and C. capitata.

Table 2. Summary of the B. minax mitogenome.

Gene | Direction | Location | Size (bp) | IGS (bp) | Anticodon | Start codon | Stop codon
--- | --- | --- | --- | --- | --- | --- | ---
trnI | F | 1-65 | 65 | 0 | GAT | |
trnQ | R | 66-134 | 69 | 10 | TTG | |
trnM | F | 145-213 | 69 | -1 | CAT | |
nad2 | F | 213-1235 | 1023 | 8 | | ATT | TAG
trnW | F | 1244-1311 | 68 | -8 | TCA | |
trnC | R | 1304-1365 | 62 | 42 | GCA | |
trnY | R | 1408-1475 | 68 | -2 | GTA | |
cox1 | F | 1474-3009 | 1536 | -1 | | TCG | TAT
trnL(UUR) | F | 3009-3072 | 65 | 5 | TAA | |
cox2 | F | 3078-3764 | 687 | 6 | | ATG | TAA
trnK | F | 3771-3841 | 71 | -1 | CTT | |
trnD | F | 3841-3908 | 68 | 0 | GTC | |
atp8 | F | 3909-4070 | 162 | -7 | | ATT | TAA
atp6 | F | 4064-4741 | 678 | -1 | | ATG | TAG
cox3 | F | 4741-5532 | 792 | 6 | | ATG | TAA
trnG | F | 5539-5604 | 66 | 0 | TCC | |
nad3 | F | 5605-5956 | 352 | 0 | | GTC | T
trnA | F | 5957-6021 | 65 | 5 | TGC | |
trnR | F | 6027-6090 | 64 | 28 | TCG | |
trnN | F | 6119-6183 | 65 | 0 | GTT | |
trnS(AGN) | F | 6184-6251 | 68 | 2 | GCT | |
trnE | F | 6254-6319 | 66 | 18 | TTC | |
trnF | R | 6338-6403 | 66 | -1 | GAA | |
nad5 | R | 6403-8122 | 1720 | 14 | | ATT | T
trnH | R | 8137-8201 | 65 | 4 | GTG | |
nad4 | R | 8206-9546 | 1341 | -17 | | ATG | TAA
nad4l | R | 9530-9826 | 297 | 2 | | ATG | TAA
trnT | F | 9829-9893 | 65 | 0 | TGT | |
trnP | R | 9894-9959 | 66 | 2 | TGG | |
nad6 | F | 9962-10483 | 522 | -1 | | ATG | TAA
cob | F | 10483-11619 | 1137 | -2 | | ATG | TAG
trnS(UCN) | F | 11618-11684 | 67 | 16 | TGA | |
nad1 | R | 11701-12640 | 940 | 10 | | ATA | T
trnL(CUN) | R | 12651-12716 | 66 | 0 | TAG | |
rrnL | R | 12717-14049 | 1333 | -1 | | |
trnV | R | 14049-14120 | 72 | 0 | TAC | |
rrnS | R | 14121-14902 | 782 | 0 | | |
CR | | 14903-16043 | 1141 | 0 | | |

IGS, intergenic spacer; negative values indicate overlapping nucleotides.
doi:10.1371/journal.pone.0100558.t002

Materials and Methods 

1. Insects, DNA extraction, PCR amplification and sequencing

Table 3. Length and base composition of different genomic regions in 10 tephritid species, B. oleae, B. tryoni, B. philippinensis, B. carambolae, B. papayae, B. dorsalis, C. capitata, B. minax, B. correcta and B. cucurbitae.

Accession No. and species | Whole mtDNA size (bp) | Whole mtDNA (A+T)% | PCGs size (bp) | PCGs (A+T)% | tRNAs size (bp) | tRNAs (A+T)% | rRNAs size (bp) | rRNAs (A+T)% | CR size (bp) | CR (A+T)%
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
AY210702 B. oleae | 15815 | 72.6 | 11188 | 70.1 | 1484 | 75.1 | 2116 | 77.1 | 949 | 86.9
HQ130030 B. tryoni | 15925 | 72.5 | 11186 | 69.6 | 1467 | 75.0 | 2115 | 77.7 | 951 | 87.0
DQ995281 B. philippinensis | 15915 | 73.6 | 11192 | 71.1 | 1466 | 75.3 | 2114 | 77.7 | 949 | 88.2
EF014414 B. carambolae | 15915 | 73.6 | 11190 | 71.2 | 1466 | 75.1 | 2113 | 77.6 | 950 | 87.9
DQ917578 B. papayae | 15915 | 73.5 | 11190 | 71.0 | 1465 | 75.1 | 2114 | 77.7 | 950 | 88.2
DQ845759 B. dorsalis | 15915 | 73.6 | 11185 | 71.2 | 1467 | 75.2 | 2123 | 77.8 | 949 | 88.1
AJ242872 C. capitata | 15980 | 77.5 | 11272 | 75.5 | 1472 | 76.8 | 2123 | 80.2 | 1004 | 91.1
HM776033 B. minax | 16043 | 67.3 | 11187 | 64.3 | 1466 | 72.2 | 2115 | 73.7 | 1141 | 77.6
JX456552 B. correcta | 15936 | 73.2 | 11192 | 71.2 | 1470 | 75.3 | 2117 | 77.9 | 949 | 78.6
JN635562 B. cucurbitae | 15825 | 72.8 | 11190 | 70.7 | 1467 | 75.1 | 2110 | 77.8 | 946 | 82.3

We collected B. minax adults from a 20-hectare citrus garden on private
land at Xianli Zeng in Wulong (Chongqing Province, China). We confirm
that Mr Zeng, the owner of this land, allowed us to conduct the study on
this site. No specific permission was required for this location and our activity.
We confirm that the field studies did not involve endangered or
protected species. B. minax adults were stored at 25°C in 99% (v/v)
ethanol. Morphological identification followed White and Wang [7].
Total DNA was isolated from three adult specimens using the
DNeasy Blood & Tissue Kit (QIAGEN) according to the
manufacturer's instructions. The complete B. minax mitogenome
sequence was assembled from a single individual (three replicates).
Purified total DNA was used as a template to amplify the entire
B. minax mitogenome in 21 overlapping fragments ranging in size
from 388 bp to 1,762 bp. PCR primers were designed as described [31]
and by comparison with the available sequences of B. oleae, B. dorsalis,
B. philippinensis, B. carambolae, B. papayae, B. correcta, B. cucurbitae,
B. tryoni and C. capitata (Table 1). Amplification was performed in a
thermocycler (Eppendorf Mastercycler 5333) in 50 µl reactions
containing 5 µl of 25 mM MgCl2, 5 µl of 10× PCR buffer (Mg2+ free),
8 µl of dNTP mixture (2.5 mM each), 3 µl of each primer (10 µM),
0.5 µl of Taq polymerase (5 U/µl; Takara Biomedical, Japan) and 2 µl
of a 1/10 dilution of the DNA extract. Amplification conditions were
5 min of initial denaturation at 94°C, followed by 34 cycles of 30 s at
94°C, 1 min at 40-58°C (depending on the primer pair) and 2 min at 72°C.
The F21 fragment (Fig. 1) was amplified using LA Taq (Takara
Biomedical, Japan) with an initial denaturation at 96°C for 2 min,
followed by 30 cycles of 10 s at 98°C and 2 min at 58°C, and a final
elongation step of 10 min at 72°C. PCR products were separated by
electrophoresis and purified using a QIAquick Gel Extraction Kit
(QIAGEN). PCR products were sequenced directly on both strands
using the amplification primers and additional ad hoc primers as
needed. Individual sequences were assembled into a consensus contig
using the DNAStar software package (DNAStar Inc.).
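The reaction recipe above lists stock concentrations and volumes rather than final concentrations. As a minimal sketch (not part of the original protocol), the final concentrations follow from C1·V1 = C2·V2. Two assumptions are labeled: "3 µl of 10 µM each primer" is read as 3 µl of each of two primers, and the remainder of the 50 µl is assumed to be water, which the text does not list.

```python
# Final concentrations in the 50 ul PCR described in the Methods, via C1*V1 = C2*V2.
TOTAL_UL = 50.0

# (component, stock concentration, unit, volume added in ul); values copied from the text.
# Assumption: "3 ul of 10 uM each primer" means 3 ul of each of the two primers.
COMPONENTS = [
    ("MgCl2", 25.0, "mM", 5.0),
    ("PCR buffer", 10.0, "x", 5.0),
    ("dNTPs (each)", 2.5, "mM", 8.0),
    ("forward primer", 10.0, "uM", 3.0),
    ("reverse primer", 10.0, "uM", 3.0),
    ("Taq polymerase", 5.0, "U/ul", 0.5),
    ("DNA (1/10 dilution)", None, None, 2.0),
]


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Solve C1*V1 = C2*V2 for the final concentration C2."""
    return stock * volume_ul / total_ul


def water_to_add(components, total_ul=TOTAL_UL):
    """Water needed to reach total_ul (an assumption; the text does not list water)."""
    used = sum(volume for *_, volume in components)
    if used > total_ul:
        raise ValueError(f"components ({used} ul) exceed the {total_ul} ul reaction")
    return total_ul - used


if __name__ == "__main__":
    for name, stock, unit, volume in COMPONENTS:
        if stock is None:
            print(f"{name}: {volume} ul added")
        else:
            print(f"{name}: {final_concentration(stock, volume):g} {unit} final")
    print(f"water: {water_to_add(COMPONENTS):g} ul")
```

Under these assumptions the mix works out to 2.5 mM MgCl2, 1× buffer, 0.4 mM of each dNTP, 0.6 µM of each primer and 2.5 U of Taq per reaction, with 23.5 µl left for water.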

2. Sequence analysis and gene annotation 

Genes encoded in the B. minax mitogenome were initially located
by comparison with homologous full-length insect mitochondrial
sequences using DNAStar. Nucleotide sequences of PCGs were
translated using the invertebrate mitochondrial genetic code.
tRNA genes were initially identified using the tRNAscan-SE Search
Server version 1.21 (available online at
http://lowelab.ucsc.edu/tRNAscan-SE/) [32] and refined using
tRNAscan-SE and RNAshapes [33]. tRNA genes that could not be
located by tRNAscan-SE owing to variant morphology were annotated
manually, and their secondary structures inferred, by comparison with
the sequences of other insect tRNAs [34-37]. Codon usage and
relative synonymous codon usage (RSCU) in PCGs were calculated
using CodonW version 1.4.2 (John Peden; available at
http://codonw.sourceforge.net/index.html) [38]. Potential secondary
structure folds of non-coding sequences and sequences in the CR
were predicted with the DNA mfold web server using default settings
(http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi) [39]. Tandem
repeats in the CR were identified using the Tandem Repeats Finder
available online (http://tandem.bu.edu/trf/trf.html) [40]. The A+T
content and the nucleotide substitution frequencies at synonymous
and non-synonymous sites (the numbers of synonymous and
non-synonymous substitutions per site) were calculated using
MEGA 4.0 [41]. Bivariate correlation analyses were performed using
SPSS version 13 (SPSS Inc., Chicago, IL). The overall mean amino
acid distance for each PCG across the ten tephritid species (B. minax,
B. oleae, B. tryoni, B. dorsalis, B. philippinensis, B. carambolae,
B. papayae, B. correcta, B. cucurbitae and C. capitata) was calculated
as a Poisson distance in MEGA 4.0 [41]. The complete B. minax
mtDNA sequence has been deposited in GenBank under
accession no. HM776033.
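The Poisson-corrected amino acid distance used for ASD transforms the proportion of differing sites p into d = -ln(1 - p). The sketch below is an illustration of that formula, not MEGA's code: it assumes pairwise deletion of gaps and unknown residues (MEGA also offers complete deletion), and the species names and sequences in the example are hypothetical.

```python
import math
from itertools import combinations

GAP_OR_UNKNOWN = set("-?X")


def p_distance(seq_a, seq_b):
    """Proportion of differing amino acids over sites where both sequences are informative."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_OR_UNKNOWN or b in GAP_OR_UNKNOWN:
            continue  # pairwise deletion of gaps/ambiguous residues (assumption)
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("p = 1: the Poisson distance is undefined")
    return -math.log(1.0 - p)


def mean_poisson_distance(alignment):
    """Overall mean of all pairwise Poisson distances in one protein alignment."""
    pairs = list(combinations(alignment.values(), 2))
    return sum(poisson_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be one PCG aligned across the ten species.
    toy = {"sp1": "MAFITLK", "sp2": "MAFIQLK", "sp3": "MSFI-LK"}
    print(round(mean_poisson_distance(toy), 4))
```

The correction matters mainly for divergent genes: at small p, d is close to p, but as p grows the logarithm compensates for multiple substitutions at the same site.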

Results and Discussion 

1. Genome organization 

The mitochondrial genome of B. minax is a closed circular
molecule of 16,043 bp. It is therefore longer than the other nine
available tephritid mitogenomes (ranging from 15,815 bp in B. oleae to
15,980 bp in C. capitata) but still well within the range of other
insect mitogenomes (14,503 bp in Rhopalomyia pomum [42] to
19,517 bp in Drosophila melanogaster [11]). The gene content is
typical of metazoan mitogenomes, with 13 PCGs (cox1-3, cob, nad1-6,
nad4l, atp6 and atp8), 22 tRNAs and two genes for ribosomal
RNA subunits (rrnS and rrnL). A long uninterrupted non-coding
region of 1141 bp, likely homologous to the insect A+T-rich
region, is present between rrnS and trnI, corresponding to positions
14,903 to 16,043 in the annotated sequence. The gene order in the
B. minax mitogenome corresponds to the typical, plesiomorphic
state hypothesized for the Pancrustacea and is shared with all
tephritids analyzed to date (Fig. 1).

Figure 2. A+T content (%) of 0-fold, 2-fold and 4-fold degenerate sites in each protein-coding gene of the mitochondrial genomes of 10 tephritid species, B. oleae, B. tryoni, B. philippinensis, B. carambolae, B. papayae, B. dorsalis, C. capitata, B. minax, B. correcta and B. cucurbitae. Error bars represent the standard deviation (SD).
doi:10.1371/journal.pone.0100558.g002

Genes in the B. minax mitogenome overlap by a total of 43 bp,
distributed in 12 segments of 1 to 17 bp, and are separated
by a total of 178 bp dispersed in 16 intergenic spacers of 2 to
42 bp (not counting the tRNA-like sequence; Table 2). Despite its
relatively large size, the B. minax mitogenome has more overlapping
sequence between genes than other tephritids: genes overlap by a
total of 35 bp at 11 boundaries in B. oleae, 29 bp at seven boundaries
in B. tryoni, 27 bp at five in B. dorsalis, 34 bp at ten in B. philippinensis,
32 bp at nine in B. carambolae, 34 bp at ten in B. papayae, 35 bp at 11
in B. correcta, 32 bp at nine in B. cucurbitae and only 3 bp at three
boundaries in C. capitata.
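The IGS column of Table 2 can be reproduced from the gene coordinates alone: for consecutive features on the circular molecule, the gap is next_start - end - 1, where negative values are overlaps. The sketch below (an illustration, not the authors' pipeline) recomputes the overlap and spacer totals quoted above; like the text, it ignores the tRNA-like sequence.

```python
GENOME_LENGTH = 16043

# (gene, start, end) from Table 2, in genome order; 1-based inclusive coordinates.
FEATURES = [
    ("trnI", 1, 65), ("trnQ", 66, 134), ("trnM", 145, 213), ("nad2", 213, 1235),
    ("trnW", 1244, 1311), ("trnC", 1304, 1365), ("trnY", 1408, 1475),
    ("cox1", 1474, 3009), ("trnL(UUR)", 3009, 3072), ("cox2", 3078, 3764),
    ("trnK", 3771, 3841), ("trnD", 3841, 3908), ("atp8", 3909, 4070),
    ("atp6", 4064, 4741), ("cox3", 4741, 5532), ("trnG", 5539, 5604),
    ("nad3", 5605, 5956), ("trnA", 5957, 6021), ("trnR", 6027, 6090),
    ("trnN", 6119, 6183), ("trnS(AGN)", 6184, 6251), ("trnE", 6254, 6319),
    ("trnF", 6338, 6403), ("nad5", 6403, 8122), ("trnH", 8137, 8201),
    ("nad4", 8206, 9546), ("nad4l", 9530, 9826), ("trnT", 9829, 9893),
    ("trnP", 9894, 9959), ("nad6", 9962, 10483), ("cob", 10483, 11619),
    ("trnS(UCN)", 11618, 11684), ("nad1", 11701, 12640),
    ("trnL(CUN)", 12651, 12716), ("rrnL", 12717, 14049), ("trnV", 14049, 14120),
    ("rrnS", 14121, 14902), ("CR", 14903, 16043),
]


def intergenic_gaps(features, genome_length):
    """Gap (positive) or overlap (negative) between each feature and the next one."""
    gaps = []
    for i, (name, _start, end) in enumerate(features):
        next_name, next_start, _ = features[(i + 1) % len(features)]
        if i == len(features) - 1:
            next_start += genome_length  # wrap around the circular molecule
        gaps.append((name, next_name, next_start - end - 1))
    return gaps


def summarize(gaps):
    """Totals, counts and size ranges of overlaps and spacers."""
    overlaps = [-size for *_, size in gaps if size < 0]
    spacers = [size for *_, size in gaps if size > 0]
    return {
        "overlap_bp": sum(overlaps), "overlap_segments": len(overlaps),
        "overlap_range": (min(overlaps), max(overlaps)),
        "spacer_bp": sum(spacers), "spacer_count": len(spacers),
        "spacer_range": (min(spacers), max(spacers)),
    }


if __name__ == "__main__":
    print(summarize(intergenic_gaps(FEATURES, GENOME_LENGTH)))
```

Running it gives 43 bp of overlap in 12 segments (1-17 bp) and 178 bp of spacer in 16 segments (2-42 bp), matching the figures reported in the text.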

2. Nucleotide composition 

The overall base composition of the B. minax mitogenome is 38.0% A,
11.2% G, 29.3% T and 21.5% C. As in other insects, the B. minax
mitogenome nucleotide composition is biased toward adenine and
thymine (67.3% A+T), although this is the lowest value among the
available tephritid mitogenomes. Analyzed separately, the PCGs
(64.3%), tRNAs (72.2%), rRNAs (73.7%) and CR (77.6%) of B. minax
all have the lowest A+T content among the known tephritid
mitogenomes (Table 3).
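For reference, base composition and A+T content are simple percentages over the A, T, G and C counts. The helper below is an illustrative sketch that ignores ambiguity codes (an assumption); the reported B. minax values are internally consistent, since 38.0% A + 29.3% T gives the stated 67.3% A+T and the four bases sum to 100%.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, G and C in a DNA sequence (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ATGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C")
    return {base: 100.0 * counts[base] / total for base in "ATGC"}


def at_content(seq):
    """A+T content (%) of a DNA sequence."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


if __name__ == "__main__":
    # Reported whole-genome composition of B. minax (values from the text).
    reported = {"A": 38.0, "T": 29.3, "G": 11.2, "C": 21.5}
    print(round(reported["A"] + reported["T"], 1), round(sum(reported.values()), 1))
```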

Considering the two strands separately, the PCGs on the majority
strand (J-strand; nine PCGs are located on this strand) have a lower
A+T content (61.5%) than those on the minority strand (N-strand;
the other four PCGs are located on this strand) (68.9%). Furthermore,
PCGs encoded on the J-strand have comparable A (31.0%) and
T (30.5%) contents, whereas PCGs on the N-strand show a strong
T bias (46.3% T versus 22.6% A). The same pattern has been
observed in the other available tephritid mitogenomes (data not
shown) and in other insects [34-37,43-50]. By contrast, tRNAs on
the two strands have nearly equal A+T contents, as also found in the
other nine tephritid species. Among the three codon positions of the
PCGs, the third position has a considerably higher A+T content than
the first and second positions, owing to the degeneracy of the genetic
code. In particular, T is over-represented at every codon position of
PCGs on the N-strand. Except for the second codon position, where
T is over-represented, the first and third codon positions of PCGs
show a preponderance of A on the J-strand and of T on the N-strand,
similar to many insect mitogenomes [34-37,43-50] (Table 3).

Figure 3. Nucleotide substitution frequency at 0-fold, 2-fold and 4-fold degenerate sites in each protein-coding gene of the mitochondrial genomes of 10 tephritid species, B. oleae, B. tryoni, B. philippinensis, B. carambolae, B. papayae, B. dorsalis, C. capitata, B. minax, B. correcta and B. cucurbitae.
doi:10.1371/journal.pone.0100558.g003

Table 5. A+T content (%) and nucleotide substitution frequency (number of substitutions per site) at 0-fold degenerate sites (P0FD), 2-fold degenerate sites (P2FD) and 4-fold degenerate sites (P4FD) in each PCG of the mitogenomes of 10 tephritid species, B. oleae, B. tryoni, B. philippinensis, B. carambolae, B. papayae, B. dorsalis, C. capitata, B. minax, B. correcta and B. cucurbitae.

Protein-coding gene | P0FD A+T (%) | P0FD substitution frequency | P2FD A+T (%) | P2FD substitution frequency | P4FD A+T (%) | P4FD substitution frequency
--- | --- | --- | --- | --- | --- | ---
nad2 | 70.96±1.41 | 0.198 | 74.61±8.66 | 0.811 | 74.33±10.99 | 1.689
cox1 | 56.53±0.21 | 0.025 | 73.31±8.55 | 0.743 | 84.93±7.82 | 1.409
cox2 | 59.71±0.74 | 0.074 | 76.92±7.67 | 0.624 | 82.47±8.64 | 1.306
atp8 | 68.92±0.81 | 0.235 | 81.54±8.66 | 0.577 | 83.33±11.86 | 1.400
atp6 | 68.76±2.92 | 0.459 | 63.36±2.91 | 0.227 | 62.46±6.84 | 0.590
cox3 | 57.46±0.36 | 0.034 | 75.10±5.61 | 0.655 | 87.34±4.95 | 1.266
nad3 | 68.38±1.05 | 0.192 | 76.61±10.74 | 0.831 | 82.12±8.50 | 1.485
nad6 | 72.59±1.44 | 0.265 | 80.12±8.60 | 0.655 | 87.68±4.11 | 1.326
cob | 61.62±0.42 | 0.057 | 68.66±9.26 | 0.741 | 81.14±6.72 | 1.417
nad1 | 65.54±0.53 | 0.094 | 88.33±2.51 | 0.381 | 73.72±1.09 | 1.090
nad4l | 71.44±1.44 | 0.107 | 87.94±4.12 | 0.397 | 83.46±8.32 | 1.154
nad4 | 76.42±3.72 | 0.783 | 77.49±1.86 | 0.172 | 41.72±0.73 | 0.004
nad5 | 66.33±1.23 | 0.172 | 85.77±4.83 | 0.313 | 85.18±5.45 | 1.234
Correlation coefficient (r) | 0.735 | | -0.217 | | 0.864 |
Confidence probability (P) | 0.004 (<0.01) | | 0.477 (>0.05) | | 0.000 (<0.01) |

Note: the correlation analysis used the Pearson coefficient with a two-tailed test of significance.
doi:10.1371/journal.pone.0100558.t005
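The correlation coefficients in Table 5 can be checked from the tabulated means. As a minimal sketch using only the standard library (not the SPSS procedure the authors used), the Pearson r between A+T content and substitution frequency at P0FD sites, taken over the 13 PCGs, comes out at about 0.735, and the accompanying t statistic has n - 2 = 11 degrees of freedom. Converting t into the two-tailed P requires a t distribution (for example via `scipy.stats.pearsonr`), which is outside this sketch.

```python
import math

# P0FD columns of Table 5 (13 PCGs, in table row order).
GENES = ["nad2", "cox1", "cox2", "atp8", "atp6", "cox3", "nad3",
         "nad6", "cob", "nad1", "nad4l", "nad4", "nad5"]
AT_P0FD = [70.96, 56.53, 59.71, 68.92, 68.76, 57.46, 68.38,
           72.59, 61.62, 65.54, 71.44, 76.42, 66.33]
SUBST_P0FD = [0.198, 0.025, 0.074, 0.235, 0.459, 0.034, 0.192,
              0.265, 0.057, 0.094, 0.107, 0.783, 0.172]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    r = pearson_r(AT_P0FD, SUBST_P0FD)
    t = t_statistic(r, len(AT_P0FD))
    print(f"r = {r:.3f}, t = {t:.2f} (df = {len(AT_P0FD) - 2})")
```

The same functions apply to the P2FD and P4FD columns. Note that nad4 has the largest values in both P0FD columns, so it contributes strongly to this correlation.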

The A+T compositional bias in PCGs is reflected in the
relative synonymous codon usage statistics of the B. minax
mitogenome (Table 4). With the exception of histidine (His),
codons with A or T in the third position are generally strongly
over-represented compared with codons ending in G or C. The
ratio of G+C-rich codons (Pro, Ala, Arg and Gly) to A+T-rich
codons (Phe, Ile, Met, Tyr, Asn and Lys) in B. minax PCGs was 0.44,
higher than in the other nine tephritids: B. dorsalis (0.29),
B. philippinensis (0.29), B. carambolae (0.30), B. papayae (0.29),
B. correcta (0.30), B. cucurbitae (0.32), B. oleae (0.31), B. tryoni (0.32)
and C. capitata (0.23). This suggests that amino acid composition is
affected by the weaker A+T mutational bias in B. minax (67.3% A+T)
compared with the stronger A+T mutational bias in B. dorsalis (73.6%),
B. philippinensis (73.6%), B. carambolae (73.6%), B. papayae (73.5%),
B. correcta (73.2%), B. cucurbitae (72.8%), B. oleae (72.6%),
B. tryoni (72.4%) and C. capitata (77.5%).
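The G+C-rich/A+T-rich codon ratio quoted above can be computed directly from coding sequences. The sketch below makes two labeled assumptions: codons are counted in frame and pooled over all PCGs, and amino acids are assigned under the invertebrate mitochondrial code (NCBI translation table 5), in which AGA/AGG encode Ser rather than Arg and ATA encodes Met rather than Ile. The example sequences are hypothetical.

```python
# Codons assigned under NCBI translation table 5 (invertebrate mitochondrial code):
# AGA/AGG encode Ser (not Arg) and ATA encodes Met (not Ile).
GC_RICH = {
    "Pro": {"CC" + b for b in "TCAG"},
    "Ala": {"GC" + b for b in "TCAG"},
    "Arg": {"CG" + b for b in "TCAG"},
    "Gly": {"GG" + b for b in "TCAG"},
}
AT_RICH = {
    "Phe": {"TTT", "TTC"}, "Ile": {"ATT", "ATC"}, "Met": {"ATA", "ATG"},
    "Tyr": {"TAT", "TAC"}, "Asn": {"AAT", "AAC"}, "Lys": {"AAA", "AAG"},
}
GC_CODONS = set().union(*GC_RICH.values())
AT_CODONS = set().union(*AT_RICH.values())


def codons(cds):
    """Split a coding sequence into in-frame codons."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial (incomplete stop) codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc_to_at_codon_ratio(cds_list):
    """Ratio of Pro+Ala+Arg+Gly codons to Phe+Ile+Met+Tyr+Asn+Lys codons, pooled over genes."""
    gc = at = 0
    for cds in cds_list:
        for codon in codons(cds):
            gc += codon in GC_CODONS
            at += codon in AT_CODONS
    if at == 0:
        raise ValueError("no A+T-rich codons found")
    return gc / at


if __name__ == "__main__":
    # Hypothetical fragments; real input would be the 13 PCGs of one species.
    print(gc_to_at_codon_ratio(["CCAGCTTTTATA", "GGGAAAAAT"]))
```

Because AGR codons are Ser under table 5, they are deliberately left out of the Arg count. Using the standard code instead would slightly inflate the G+C-rich side.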

With the exception of the first codon positions, G is under-represented
relative to C in genes and regions on the J-strand (PCGs, tRNAs,
the CR and intergenic nucleotides), whereas G content is higher
than C content in genes on the N-strand (PCGs, tRNAs and rRNAs).
This base compositional bias is in line with the general trend
toward a lower G content in mitogenomes [51].

Base compositional heterogeneity and among-site rate variation
(ASRV) are known to affect phylogenetic inference and can lead to
incorrect phylogenetic relationships [52]. The simplest solution is
to avoid non-stationary genes [53], but most earlier studies used
relatively intuitive partitioning schemes for mitogenome data,
for example by gene type (PCG, rRNA and tRNA), by gene, by codon
position, by codon and gene, or by the strand on which the gene is
located [15]. Different intuitive partitioning schemes can yield
strongly conflicting topologies, especially at deeper phylogenetic
levels [25,54,55]. Therefore, selecting stationary, reversible and
compositionally homogeneous genes is vital for reliable phylogenetic
inference [52,56].

Many earlier studies focused on the A+T content of different genes
or regions to investigate base compositional heterogeneity and
ASRV [57]. For mitogenomes, A+T compositional bias has been
documented in most earlier studies; for example, A+T is usually
over-represented in non-coding regions [58], and the third codon
position generally shows a stronger A+T bias than the other two
codon positions [59]. We asked how variability among PCGs is
related to the underlying A+T content and its distribution across
synonymous and non-synonymous sites.

Figure 4. Predicted secondary clover-leaf structures for the 22 tRNA genes of B. minax. Each tRNA is labeled with the abbreviation of its corresponding amino acid below the structure. Arms of tRNAs (clockwise from top) are the amino acid acceptor arm, the TΨC arm, the anticodon arm and the dihydrouridine (DHU) arm. (A) J-strand-encoded tRNAs. (B) N-strand-encoded tRNAs.
doi:10.1371/journal.pone.0100558.g004

In this study, the A+T content of zero-fold (P0FD), two-fold (P2FD)
and four-fold degenerate sites (P4FD) was determined for each PCG
in the ten tephritid species (B. minax, B. oleae, B. tryoni, B. dorsalis,
B. philippinensis, B. carambolae, B. papayae, B. correcta, B. cucurbitae
and C. capitata) (Fig. 2). The nucleotide substitution frequency at
P0FD, P2FD and P4FD sites was calculated for each PCG among the
ten tephritid species (Fig. 3). Analyzing the correlation between A+T
content and nucleotide substitution frequency across PCGs, we found
a significant positive correlation between the A+T content of zero-fold
degenerate sites (AT0F) and the nucleotide substitution frequency at
P0FD (r = 0.735, P = 0.004), as well as between the A+T content of
four-fold degenerate sites (AT4F) and the nucleotide substitution
frequency at P4FD (r = 0.864, P < 0.001) (Table 5). Correlation
analysis further indicated significant positive correlations between
AT0F and ASD (r = 0.752, P = 0.003), between ASD and the ratio of the
number of nucleotide substitutions at zero-fold degenerate sites to that
at all degenerate sites (R0F/all) (r = 0.983, P < 0.001), and between
AT0F and R0F/all (r = 0.760, P = 0.003) (Table 6). Interestingly, AT4F
was significantly positively correlated with the ratio of the number of
nucleotide substitutions at four-fold degenerate sites to that at all
degenerate sites (R4F/all) (r = 0.809, P = 0.001), whereas ASD was
significantly negatively correlated with both AT4F (r = -0.828,
P < 0.001) and R4F/all (r = -0.970, P < 0.001) (Table 6). On the basis
of these results, we hypothesize that amino acid divergence in less
conserved PCGs is associated with higher A+T content at P0FD and/or
lower A+T content at P4FD. Accordingly, when choosing PCGs to analyze
phylogenetic relationships over different evolutionary time scales, the
A+T content at P0FD and/or P4FD could help assess the compositional
homogeneity of the candidate genes.
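To make the site classes concrete, the sketch below classifies every codon position as 0-fold, 2-fold or 4-fold degenerate by counting how many of the three alternative bases leave the amino acid unchanged, and then reports A+T content per class. It is an illustration, not the MEGA implementation, and assumes NCBI translation table 5 (invertebrate mitochondrial code), with changes to a stop codon counted as non-synonymous and stop codons themselves skipped.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codons in TTT, TTC, TTA, TTG, TCT, ... order.
AA_TABLE5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE5)}
FOLD = {0: "0-fold", 1: "2-fold", 2: "3-fold", 3: "4-fold"}  # 3-fold does not arise under table 5


def site_class(codon, pos):
    """Degeneracy class of position pos (0, 1 or 2) in a sense codon."""
    aa = CODE[codon]
    synonymous = 0
    for b in BASES:
        if b == codon[pos]:
            continue
        mutant = codon[:pos] + b + codon[pos + 1:]
        if CODE[mutant] == aa:  # a change to a stop codon ('*') counts as non-synonymous
            synonymous += 1
    return FOLD[synonymous]


def at_by_site_class(cds):
    """A+T content (%) of 0-fold, 2-fold and 4-fold degenerate sites in one coding sequence."""
    counts = {}  # class -> [A+T count, total sites]
    seq = cds.upper()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon not in CODE or CODE[codon] == "*":
            continue  # skip stop codons and codons containing ambiguous bases
        for pos in range(3):
            tally = counts.setdefault(site_class(codon, pos), [0, 0])
            tally[0] += codon[pos] in "AT"
            tally[1] += 1
    return {cls: 100.0 * at / n for cls, (at, n) in counts.items()}


if __name__ == "__main__":
    print(at_by_site_class("ATGCCATTT"))  # hypothetical three-codon fragment
```

Under table 5 the third position of ATA (Met) is 2-fold rather than 3-fold as in the standard code. This is one reason the genetic code must match the one used to translate the PCGs.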

Nucleotide substitution is considered a reflection of evolution at
the molecular level. Many earlier studies showed that substitutions
are directionally biased across different genes in the mitogenome [15],
and some researchers have proposed that variation in A+T content
among taxa is associated with directional mutation pressure and has
a phylogenetic component [57,60,61]. In this study, with the exception
of nad4, all PCGs showed lower variation in A+T content among the
ten tephritid species at P0FD sites than at P2FD and P4FD sites.
Likewise, with the exception of nad4, P0FD sites had a lower nucleotide
substitution frequency than P4FD sites and, except in nad4 and atp6,
also lower than P2FD sites (Fig. 3; Table 5). In nad4, P0FD sites had a
higher nucleotide substitution frequency (0.783) than both P2FD (0.172)
and P4FD (0.004) sites, and R0F/all was 0.936. Because of functional
constraints, the number of nucleotide substitutions per non-synonymous
site is usually lower than that per synonymous site [62]. The higher
nucleotide substitution frequency at P0FD in nad4 therefore indicates
that, for this gene, the non-synonymous substitution frequency was
higher than the synonymous one. A higher number of nucleotide
substitutions per non-synonymous site has been observed in the
variable-region genes of immunoglobulins [63] and in some genes of
the major histocompatibility complex [64], but, to our knowledge, this
is the first reported occurrence in a mitogenome.
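A quick screen of the Table 5 values shows which genes have a higher substitution frequency at P0FD (non-synonymous) sites than at the other site classes. This is a simple comparison of the tabulated means, sketched for illustration rather than a formal dN/dS test: only nad4 exceeds its P4FD value, while atp6 joins nad4 when the comparison is against P2FD sites.

```python
# Substitution frequencies per site class from Table 5: (P0FD, P2FD, P4FD).
SUBST = {
    "nad2": (0.198, 0.811, 1.689), "cox1": (0.025, 0.743, 1.409),
    "cox2": (0.074, 0.624, 1.306), "atp8": (0.235, 0.577, 1.400),
    "atp6": (0.459, 0.227, 0.590), "cox3": (0.034, 0.655, 1.266),
    "nad3": (0.192, 0.831, 1.485), "nad6": (0.265, 0.655, 1.326),
    "cob": (0.057, 0.741, 1.417), "nad1": (0.094, 0.381, 1.090),
    "nad4l": (0.107, 0.397, 1.154), "nad4": (0.783, 0.172, 0.004),
    "nad5": (0.172, 0.313, 1.234),
}
P2FD, P4FD = 1, 2  # column indices into each tuple


def genes_where_p0fd_exceeds(table, column):
    """Genes whose P0FD substitution frequency exceeds that of the given site class."""
    return sorted(gene for gene, freqs in table.items() if freqs[0] > freqs[column])


if __name__ == "__main__":
    print("P0FD > P4FD:", genes_where_p0fd_exceeds(SUBST, P4FD))
    print("P0FD > P2FD:", genes_where_p0fd_exceeds(SUBST, P2FD))
```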

3. Protein-coding genes 

With the exception of cox1 and nad3, all PCGs start with an ATN
codon: ATG in cox2, atp6, cox3, nad4, nad4l, nad6 and cob; ATT in
nad2, atp8 and nad5; and ATA in nad1. cox1 and nad3 use TCG (Ser)
and GTC, respectively, as initiation codons. TCG as the cox1
initiation codon has been observed in other Diptera species [54];
GTC as the nad3 initiation codon is a new observation in tephritids,
although it is common in other insects [65].

Nine PCGs are terminated by complete stop codons: TAG in nad2,
atp6 and cob, and TAA in cox2, atp8, cox3, nad4, nad4l and nad6.
The remaining genes end with incomplete stop codons: TA in cox1,
and T in nad3, nad5 and nad1.

4. Transfer RNA genes, ribosomal RNA genes and tRNA- 
like structure 

All of 22 tPvNA genes typical of metazoan mitogenomes were 
identified in the B. minax mitogenome, and the predicted structures 
are shown in Fig. 4. All tRNAs display a typical clover-leaf 
secondary structure, except for tmtf , where the DHU arm 
appears to be replaced by seven unpaired nucleotides, a feature 
typical of other animal mitochondria [66]. Nuclear magnetic 
resonance analysis of the tertiary structure of nematode trnS^ AGN ^ 
suggested such aberrant tRNA can fit the ribosome by adjusting its 
structural conformation and function in a way similar to that of 
usual tRNAs in the ribosome [67]. 

Like most insect tRNAs, all B. minax tRNAs have a length of 
7 bp for the anticodon loop, 7 bp for the acceptor stem and 5 bp 
for anticodon stem. Most of the size variability in the B. minax 
tRNA genes originated from length variation in the DHU arms 
(loop size 4-9 bp, stem size 3-4 bp) and the T*PC arms (loop size 
2-9 bp, stem size 3-5 bp); in addition, trnA and tmH contained U- 
U mismatches. trn^ lJCN> encodes an A-C mismatch, tmH encodes 



Table 7. Locations, lengths and sequences of four short intergenic spacers in 10 tephritid species: B. oleae, B. tryoni, B. philippinensis, B. carambolae, B. papayae, B. dorsalis, C. capitata, B. minax, B. correcta and B. cucurbitae.

| Species | tRNA Glu–tRNA Phe: sequence | Size (bp) | ND5–tRNA His: sequence | Size (bp) | tRNA Ser(UCN)–ND1: sequence | Size (bp) | ND1–tRNA Leu(CUN): sequence | Size (bp) |
|---|---|---|---|---|---|---|---|---|
| B. minax | ACTAATTACAATTCACTA | 18 | TGATATATATTTCA | 14 | TACTAAATATAATTAC | 16 | AAAAAACAAG | 10 |
| B. oleae | ACTAAAATAAATACACTA | 18 | TGATAAATACTTCAC | 15 | TACTAAATAAAATTA | 15 | AAAAAACAAG | 10 |
| B. tryoni | ACTAAATGGAATACACTA | 18 | TGACAAATATTTCAC | 15 | TACTAAATTTTATTA | 15 | AAAAAACAAG | 10 |
| B. dorsalis | ACTAAATATAATACACTA | 18 | TGATAAATATTTCAC | 15 | TACTAAATTCTATTA | 15 | AAAAAACAAG | 10 |
| B. philippinensis | ACTAAATATAATGCACTA | 18 | TGATAAATATTTCAC | 15 | TACTAAATTTTATTA | 15 | AAAAAACAAG | 10 |
| B. carambolae | ACTAAATATAATACACTA | 18 | TGATAAATATTTCAC | 15 | TACTAAATTTTATTA | 15 | AAAAAACAAG | 10 |
| B. papayae | ACTAAATATAATACACTA | 18 | TGATAAATATTTCAC | 15 | TACTAAATTTTATTA | 15 | AAAAAACAAG | 10 |
| B. correcta | ACTAAATTTTATACACTA | 18 | TGATAAATATTTCAC | 15 | TACTAAATTATATTA | 15 | AAAAAACAAG | 10 |
| B. cucurbitae | ACTAAATATAATTCACTA | 18 | TGATAAATATTTCAC | 15 | TACTAATTTTTATTA | 15 | AAAAAACAAG | 10 |
| C. capitata | ACTAAAAATAATTAACTA | 18 | TGATAAATAATTTTTCAC | 18 | TACTAAAATTAATTAA | 16 | TAAAAACAAG | 10 |

doi:10.1371/journal.pone.0100558.t007
Figure 5. Alignment of the poly-thymidine stretch at the 5' end of the control region described by Zhang et al. (1997) among 10 tephritid species: B. oleae, B. tryoni, B. philippinensis, B. carambolae, B. papayae, B. dorsalis, C. capitata, B. minax, B. correcta and B. cucurbitae. The poly-T stretch spans nucleotide positions 15,974 to 15,997 of the B. minax mitogenome (5'-3' direction).
doi:10.1371/journal.pone.0100558.g005



an A-G mismatch and trnR has a U-C mismatch in the acceptor stem. Additionally, trnV contains a U-U mismatch in the TΨC stem.
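The mismatch calls above rely on a pairing convention for stem nucleotides. As a minimal sketch, assuming that Watson-Crick pairs (A-U, G-C) and G-U wobble pairs count as paired and every other combination counts as a mismatch (the source does not state its convention), the pairing of a stem can be checked as follows. Both arms are written 5'-3', and the stem sequences are hypothetical, not B. minax data.

```python
# Assumed convention: Watson-Crick and G-U wobble pairs are accepted; anything else is a mismatch.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(base5: str, base3: str) -> str:
    """Classify the pair formed by a 5'-arm base and its 3'-arm partner."""
    pair = (base5.upper(), base3.upper())
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


def stem_mismatches(arm5: str, arm3: str) -> list[tuple[int, str, str]]:
    """List (pair number, 5'-arm base, 3'-arm base) for every mismatched pair.

    Both arms are written 5'->3'. The 3' arm is read backwards so that the
    first base of arm5 pairs with the last base of arm3.
    """
    if len(arm5) != len(arm3):
        raise ValueError("stem arms must have the same length")
    return [
        (i, b5, b3)
        for i, (b5, b3) in enumerate(zip(arm5, reversed(arm3)), start=1)
        if pair_type(b5, b3) == "mismatch"
    ]


# Hypothetical 7 bp acceptor stem with a U-U mismatch at the third pair.
print(stem_mismatches("GCUAGCA", "UGCUUGC"))  # [(3, 'U', 'U')]
```

Under this convention a U-U, A-C, A-G or U-C pair is reported, while a G-U pair is not.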

The anticodon sequences are identical to those of B. dorsalis, B. oleae, B. tryoni and C. capitata, and match those commonly found in other insects, including Gryllotalpa orientalis [68], Philaenus spumarius [35], Phthonandria atrilineata [50] and Artogeia melete [36].

Based on sequence similarity with B. dorsalis, the genes encoding the large (rrnL) and small (rrnS) ribosomal subunit RNAs were located in the B. minax mitogenome between trnL(CUN) and trnV and between trnV and the CR, respectively. The B. minax rrnS and rrnL genes are 782 bp and 1333 bp long, respectively, similar to those of B. dorsalis, B. oleae and C. capitata.

5. Intergenic spacers 

In B. minax, the two longest intergenic spacers were 42 bp (between trnC and trnT) and 28 bp (between trnR and trnN). In B. dorsalis, the second longest intergenic spacer was 45 bp, between trnC and trnT. In B. tryoni, the second longest intergenic spacer was 33 bp, between trnR and trnN, and the third longest was 30 bp, between trnC and trnT. In B. oleae, the longest intergenic spacer was 28 bp, between trnR and trnN. In B. minax, however, the intergenic spacer between trnQ and trnM was only 10 bp, much shorter than those at the same location in B. dorsalis (66 bp), B. tryoni (71 bp) and C. capitata (47 bp). Yu et al. [48] reported that the 45 bp intergenic spacer between trnC and trnT in B. dorsalis has a clear counterpart in the CR, with the first 33 of its 45 bp matching. These counterparts were predicted to form a small internal stem and a long stem structure by pairing with a partially complementary sequence in the CR. A similar pattern was observed in the B. tryoni mitogenome, where both the second longest (33 bp, between trnR and trnN) and the third longest intergenic spacer (30 bp, between trnC and trnT) have clear counterparts on the N-strand of the CR (matching at 32 of 33 and 25 of 30 bases, respectively). These two intergenic spacers are highly similar to each other, and their counterparts are located at the same position in the CR. We therefore asked whether the 42 bp intergenic spacer between trnC and trnT in B. minax shares these features. The first 15 of its 42 bp have a clear counterpart in the CR at positions 15,670–15,684. The 42 bp spacer was predicted to form two stem-loop secondary structures, each with a 4 nt loop, one with a 3 bp stem and the other with a 4 bp stem. The first 15 bp of the spacer formed one of these two structures: a 4 bp stem with a 4 nt loop and 3 nt of flanking sequence. Its counterpart in the CR also formed a long stem structure with the neighboring sequence. Yu et al. [48] compared the 33 bp counterpart in the B. dorsalis CR with the B. oleae CR and found that 25 of the 33 bp were identical. Surprisingly, 23 of the same 33 bases were identical in the B. minax CR. Taken together, these results support the hypothesis that the secondary structures formed by the counterparts in the intergenic spacer and the CR might play a major role in recombination [48,69].
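The 15 bp structure described above is internally consistent: a 4 bp stem occupies 2 × 4 = 8 nt, which together with the 4 nt loop and 3 nt of flanking sequence gives 15 nt. The sketch below scans a sequence for hairpins with a perfectly paired stem of fixed length and a loop of fixed length. It is a simple exact-match search assumed for illustration, not a substitute for an energy-based RNA folding program: it ignores G-U pairs and mismatched stems, and the demo sequence is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def find_hairpins(seq: str, stem_len: int, loop_len: int) -> list[int]:
    """1-based start positions of hairpins with a perfectly paired stem.

    A hit means seq[i:i+stem_len] is the reverse complement of the arm that
    begins loop_len nt after it, i.e. stem + loop + stem with no mismatches.
    """
    seq = seq.upper()
    span = 2 * stem_len + loop_len
    hits = []
    for i in range(len(seq) - span + 1):
        arm5 = seq[i:i + stem_len]
        arm3 = seq[i + stem_len + loop_len:i + span]
        if arm3 == reverse_complement(arm5):
            hits.append(i + 1)
    return hits


# Hypothetical 15 nt sequence: 4 bp stem (GCGC/GCGC), 4 nt loop (AAAA), 3 nt flank (TTT).
demo = "GCGCAAAAGCGCTTT"
assert len(demo) == 2 * 4 + 4 + 3
print(find_hairpins(demo, stem_len=4, loop_len=4))  # [1]
```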

The four short intergenic spacers in B. minax, ISS-1 (18 bp, between trnE and trnF), ISS-2 (14 bp, between nad5 and trnH), ISS-3 (16 bp, between trnS(UCN) and nad1) and ISS-4 (10 bp, between nad1 and trnL(CUN)), have counterparts of similar size at the same locations in B. dorsalis, B. philippinensis, B. carambolae, B. papayae, B. correcta, B. cucurbitae, B. oleae and B. tryoni (18, 15, 15 and 10 bp, respectively) and in C. capitata (18, 18, 16 and 10 bp, respectively). The spacers at each location share high sequence identity across the ten species (71.4–100%; Table 7).
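Percentage identity is the share of aligned positions carrying the same base. A minimal sketch, assuming equal-length, ungapped sequences (gapped comparisons need a proper pairwise aligner, and the source does not state how its 71.4–100% values were computed), applied to the ISS-1 sequences in Table 7:

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * identical / len(a)


# ISS-1 (tRNA Glu-tRNA Phe) sequences from Table 7.
iss1 = {
    "B. minax": "ACTAATTACAATTCACTA",
    "B. oleae": "ACTAAAATAAATACACTA",
    "B. dorsalis": "ACTAAATATAATACACTA",
}
for species in ("B. oleae", "B. dorsalis"):
    pid = percent_identity(iss1["B. minax"], iss1[species])
    print(f"B. minax vs {species}: {pid:.1f}%")
```

With this position-by-position count, B. minax ISS-1 matches B. oleae at 13 of 18 positions (72.2%) and B. dorsalis at 15 of 18 (83.3%).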

Additionally, all four intergenic spacers have clear counterparts in the CRs of the ten tephritid species (data not shown), but, unlike the longer spacers, they are not predicted to form stable secondary structures (at most, some can form stem-loops with 2–3 bp stems). Some earlier studies focused on longer intergenic spacers with potential secondary structures and searched for their original sequences and structures in the CR [48]. Even among closely related tephritid species, however, these longer intergenic spacers differ considerably in sequence, length and location. Cameron et al. [70] suggested that stem-loop structures, rather than tRNAs, at the 3' ends of PCGs may promote gene rearrangement. Two of the four short intergenic spacers are located at the 3' ends of PCGs but do not form stem-loop structures, which might help explain why no gene rearrangements have been found in tephritid species. This is the first report of short intergenic spacers whose sequences and locations are highly conserved across tephritid species; although their functions remain unclear, these shorter spacers deserve more attention.

6. CR 

The CR has a high A+T content compared with the other regions of the mitogenome in both vertebrates and invertebrates, and its role in the initiation of replication is one of its most interesting features [8]. Zhang and Hewitt [71] proposed five conserved structural features based on a comparison of the CRs of one dipteran and two orthopteran species: (1) a poly(T) stretch at the 5' end of the CR; (2) a [TA(A)]n-like stretch following the poly(T) stretch; (3) a highly conserved stem-loop structure; (4) a stem-loop structure flanked by a highly conserved TATA consensus at its 5' end and a G(A)nT consensus at its 3' end; and (5) a G+A-rich sequence downstream of the secondary structure. The B. minax CR has three of these five features.
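Both the A+T richness of the CR and feature (5) are statements about base composition. A small helper, assumed here for illustration, computes the percentage of the given bases (A+T or G+A) over unambiguous positions; the example uses the B. minax ISS-4 sequence from Table 7.

```python
def base_content(seq: str, bases: str) -> float:
    """Percentage of unambiguous positions (A, C, G, T) occupied by `bases`.

    bases="AT" gives A+T content; bases="GA" gives G+A content.
    Ambiguity codes such as N are excluded from numerator and denominator.
    """
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    wanted = set(bases.upper()) & set("ACGT")
    return 100.0 * sum(seq.count(b) for b in wanted) / total


iss4 = "AAAAAACAAG"  # B. minax ISS-4 from Table 7
print(base_content(iss4, "AT"), base_content(iss4, "GA"))  # 80.0 90.0
```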

The CRs of four tephritid species, including B. minax, contain a conspicuous poly(T) stretch at the 5' end. This stretch has also been found to be conserved within the Hymenoptera [49]. Further, the poly(T) stretch is followed by a [TA(A)]n-like stretch (Fig. 5). The conservation of this poly(T) region suggests that it might be involved in the control of transcription and/or replication, or have other, as yet unknown, functions [10]. Additionally, a highly conserved G+A-rich sequence block was found upstream of the poly(T) stretch in the four tephritid species, and these sequences are predicted to form stem-loop secondary structures. A highly conserved G+A-rich sequence near a poly(T) stretch has also been found in other dipteran and orthopteran species [71].
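Features (1) and (2) can be screened for with two regular expressions. The sketch below is a simplified illustration, not the procedure used in this study: the minimum poly(T) length (10 nt), the minimum number of TA/TAA units (3) and the allowed gap between the two stretches (5 nt) are arbitrary assumptions, and the demo sequence is hypothetical.

```python
import re

# Assumed thresholds (not from the source): a poly(T) stretch is >= 10 consecutive T,
# and a [TA(A)]n-like stretch is >= 3 consecutive "TA" or "TAA" units.
POLY_T = r"T{10,}"
TA_REPEAT = r"(?:TAA?){3,}"


def runs(pattern: str, seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) coordinates of non-overlapping matches."""
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq.upper())]


def polyt_then_ta(seq: str, max_gap: int = 5):
    """Pair each poly(T) run with the first [TA(A)]n-like run that starts after it
    and within max_gap nt of its end (the two runs may share one boundary T)."""
    ta_runs = runs(TA_REPEAT, seq)
    pairs = []
    for t_start, t_end in runs(POLY_T, seq):
        for ta_start, ta_end in ta_runs:
            if t_start < ta_start <= t_end + 1 + max_gap:
                pairs.append(((t_start, t_end), (ta_start, ta_end)))
                break
    return pairs


# Hypothetical sequence: 12 T, one C, then a TA/TAA stretch.
demo = "GGAG" + "T" * 12 + "C" + "TATAATATAA" + "GC"
print(runs(POLY_T, demo))   # [(5, 16)]
print(polyt_then_ta(demo))  # [((5, 16), (18, 27))]
```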

In the B. minax CR, more than ten sequences can potentially form stem-loop structures with perfectly matched stems and loops of variable size, and several other stem-loops with some mismatches in the stems can be predicted. However, no clear stem-loop structure with conserved flanking sequences was found in the CRs of the ten tephritid species. The B. minax CR contains no tRNA-like sequences but does contain two tandem repeats ranging in size from 33 to 45 bp: the sequence TATTAATTTTATTAAA occurs twice and the sequence CCTTTTAAATTTTCC occurs three times. The two repeats are located at positions 15,325–15,357 and 15,858–15,903, respectively. Among the other tephritid species, we found one tandem repeat in the CRs of B. dorsalis, B. correcta, B. cucurbitae and C. capitata, two in B. philippinensis and B. carambolae, three in B. oleae and B. papayae, and none in B. tryoni.
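Locating a tandem repeat amounts to finding a motif that recurs back-to-back. As a simplified sketch, assuming exact copies only (dedicated tandem-repeat finders also tolerate mismatches and indels, and the source does not name the tool it used), the function below reports runs in 1-based inclusive coordinates, the convention normally used for genome positions. The demo sequence is hypothetical, built from the CCTTTTAAATTTTCC motif quoted above.

```python
def tandem_copies(seq: str, motif: str, min_copies: int = 2):
    """Find runs of exact, back-to-back copies of motif.

    Returns (start, end, copies) with 1-based inclusive coordinates, so the
    run length is end - start + 1 == copies * len(motif).
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    i = 0
    while i <= len(seq) - k:
        copies = 0
        while seq.startswith(motif, i + copies * k):
            copies += 1
        if copies >= min_copies:
            hits.append((i + 1, i + copies * k, copies))
            i += copies * k  # skip past this run
        else:
            i += 1
    return hits


# Hypothetical sequence built from the B. minax motif quoted above.
motif = "CCTTTTAAATTTTCC"
demo = "GA" + motif * 3 + "TG"
print(tandem_copies(demo, motif))  # [(3, 47, 3)]
```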